How do patterns become predictions and data transform into decisions? Just as we learn from experience, AI models learn from examples. The key difference is that while we might need just a few examples to understand a concept, these models often need thousands or, in some cases, millions of examples to learn effectively. Why?

AI models use math to build an understanding of the world. From the examples we provide, they discover patterns and help us make sense of data, find hidden relationships, predict future outcomes, and make informed decisions. Without these models, we’d be drowning in numbers, words, and images with no way to connect the dots.

Here, we will explore various algorithms, starting with the classic linear models – the simplest of them all. These models learn to draw lines through data points, much like you might have done in your high school math class. However, they find these lines automatically from examples. From there, we’ll discover decision trees that make yes/no choices and clustering algorithms that learn to group similar things together. Don’t worry if these terms sound complex now. We’ll break down how each model learns and why it’s useful.

Soon you’ll have a solid understanding of these models, their applications, and how they bring us one step closer to making AI work for us. Despite the excitement around deep learning and large language models, these fundamental algorithms remain invaluable. They’re faster to train, easier to interpret, and more efficient for many day-to-day problems, whether you’re forecasting sales or detecting suspicious transactions. You don’t need a flamethrower to light a candle, after all! So, let’s roll up our sleeves and dive into the fascinating world of machine learning models.

Linear models

Imagine you’re a pirate holding a treasure map with a scattering of dots. Each dot marks where previous treasure hunters found gold. Some found more, and some found less. Your task is to find the pattern of where the treasure tends to be found. Once you draw a line through these dots, you can predict how much gold might be buried at any spot on the map, even where no one has dug before. Congratulations! You’ve just visualised a simple linear model.

In the context of AI, we’re looking for different kinds of patterns. The dots are our data points, and instead of connecting them like a path, we draw a line that best represents their overall trend. This line becomes the model’s way of understanding the relationship in our data. Just as your “treasure line” might help predict the value of treasure based on location, a linear model uses its line to predict new, unseen cases. These models look for patterns that can be described by a straight line or its higher-dimensional cousins. When we feed in information (called independent variables), the model uses its line to make predictions (called dependent variables). This simple idea is surprisingly powerful. The line acts as a mathematical summary of the relationship between our inputs and outputs. To understand how these models find the “best” line automatically, let’s look at our first algorithm, Linear Regression.

Linear regression

Imagine you’re a real estate agent and need to predict the selling price for a house. You know that larger houses generally cost more than smaller ones. But how much more? You have various facts and figures about the house, like its size in square feet, the number of rooms, the amenities, and the neighbourhood. Rather than guessing, you could look at recent house sales in your area and try to find a pattern between these features and the price. This is exactly what linear regression helps us do. Each of these inputs, like square footage and the number of rooms, is an independent variable. The dependent variable would be the house’s selling price because that’s what we are trying to predict based on the house’s features.

Before we dive in, let’s understand what “linear” means. A linear relationship is one where changes in one variable lead to proportional changes in another. For example, if you’re buying tomatoes at $10 per pound, each additional pound adds exactly $10 to your total cost. This creates a straight-line pattern when graphed. However, not all relationships are linear. Doubling the length of a square’s side, for instance, makes its area four times larger, not two. So, linear regression works best when we believe there’s a roughly straight-line relationship between our variables.
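To see the difference in numbers, here is a minimal Python sketch of the two examples above. The function names are ours, chosen only for illustration.

```python
PRICE_PER_POUND = 10  # the $10 per pound from the tomato example

def tomato_cost(pounds):
    """Linear: every extra pound adds the same amount to the bill."""
    return PRICE_PER_POUND * pounds

def square_area(side):
    """Not linear: the area grows faster and faster as the side grows."""
    return side ** 2

# How much does the output change each time the input goes up by one?
cost_steps = [tomato_cost(p + 1) - tomato_cost(p) for p in range(1, 5)]
area_steps = [square_area(s + 1) - square_area(s) for s in range(1, 5)]

print(cost_steps)  # [10, 10, 10, 10] -> constant steps, a straight line
print(area_steps)  # [3, 5, 7, 9]     -> growing steps, a curve
```

Constant steps are the signature of a straight line. When the steps keep changing, a single line can only ever be a rough approximation.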

How does it work? Suppose each house becomes a point on a graph, with square footage on one axis and price on the other. Our goal is to find a line that best represents this data. This line has two important components: the bias (also called the intercept) and the slope. The intercept tells us where the line starts. Think of it as the baseline price you’d expect for any house, regardless of its features. The slope tells us how much the price changes when we increase the square footage. Together, these are called coefficients, and finding the right values for them is key to making good predictions.
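As a rough sketch, a line with an intercept and a slope fits in a few lines of Python. The numbers below are made up for illustration and are not taken from any real housing data.

```python
# Hypothetical coefficients, assumed purely for this example.
INTERCEPT = 50_000  # baseline price in dollars
SLOPE = 150         # extra dollars per additional square foot

def predict_price(square_feet):
    """Price on the line: intercept + slope * square footage."""
    return INTERCEPT + SLOPE * square_feet

print(predict_price(1_000))  # 200000
print(predict_price(1_001))  # 200150 -> one more square foot adds the slope
```

Changing the intercept slides the whole line up or down, while changing the slope tilts it.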

If house prices were perfectly linear with size, all these points would fall exactly on a straight line. But in the real world, data is messy. Two houses of the same size might sell for different prices due to location, upgrades done to them, or market timing.

This is where the “regression” part of linear regression comes in. Instead of trying to connect all points perfectly, we look for a line that best captures the overall trend. But what makes a line best? For each house in our data, we can measure how far off our line’s predicted price is from the actual price. These differences (the actual price minus the predicted price) are called residuals. A positive residual means our line predicted the price too low. A negative residual means it guessed too high.
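Here is a small sketch of how residuals could be computed, taking a residual to be the actual price minus the predicted price. The three houses and the line’s coefficients are invented for the example.

```python
# Hypothetical line: a $50,000 baseline plus $150 per square foot.
def predict_price(square_feet):
    return 50_000 + 150 * square_feet

# Made-up past sales: (square feet, actual selling price).
sales = [(1_000, 210_000), (1_500, 265_000), (2_000, 350_000)]

# Residual = actual price - predicted price.
residuals = [actual - predict_price(size) for size, actual in sales]

print(residuals)  # [10000, -10000, 0]
```

The first house sold for more than the line predicted (positive residual), the second for less (negative residual), and the third landed exactly on the line.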

We could try to minimise these residuals, but there’s a catch. If we simply add up all residuals, positive and negative values would cancel each other out, tricking us into thinking we have a good line when we don’t. Instead, we square each residual before adding them up. This approach, called the “least squares method,” has two clever effects. Squaring means no term can be negative (so they can’t cancel out), and it penalises large errors more heavily than small ones. After all, being off by $100,000 in a house price prediction is worse than being off by $10,000.
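The cancelling problem is easy to see with numbers. In this sketch, the two lists of residuals are invented: one stands for a poor line that misses by $100,000 in each direction, the other for a better line that misses by $10,000.

```python
def total(residuals):
    """Plain sum: positive and negative errors cancel out."""
    return sum(residuals)

def total_squared(residuals):
    """Sum of squares: every error counts, and big ones count far more."""
    return sum(r ** 2 for r in residuals)

poor_line = [100_000, -100_000]   # hypothetical residuals, badly off
better_line = [10_000, -10_000]   # hypothetical residuals, much closer

print(total(poor_line), total(better_line))  # 0 0 -> both look "perfect"
print(total_squared(poor_line))              # 20000000000
print(total_squared(better_line))            # 200000000
```

The plain sum cannot tell the two lines apart. After squaring, the poor line scores 100 times worse, even though each of its errors is only 10 times larger.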

So, while many different lines could be drawn through our points, only one of them minimises the sum of squared residuals. That is the solution that the least squares method finds. There are different ways to find the best values for the slope and the bias. The least squares method is one approach. Another important method, gradient descent, can also be used for this task and will be covered in the next chapter. Both methods aim to minimise prediction errors, just in different ways.
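For a single feature, the least squares line can be worked out directly with the standard textbook formulas for the slope and intercept. The sketch below uses them on four made-up house sales; it is one way to do it, not necessarily how any particular library does.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope: how x and y vary together, divided by how much x varies.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    # The least squares line always passes through the point of averages.
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Total squared error of a given line on the data."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up sales: square footage and selling price.
sizes = [1_000, 1_500, 2_000, 2_500]
prices = [200_000, 280_000, 340_000, 430_000]

intercept, slope = fit_line(sizes, prices)
print(intercept, slope)  # 50000.0 150.0
```

Nudging either value away from this result, say by changing the slope to 151, makes the sum of squared residuals larger, which is what “best” means here.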

In practice, predicting house prices isn’t just about square footage. We might also consider the number of bedrooms, the age of the house, the location, and many other factors. Each factor becomes another dimension in our analysis. While we can’t visualise these higher dimensions easily, the same principles apply – we’re still looking for the best way to predict prices based on all these features together.

Once we’ve found our best fit, we can use it to make predictions. For a new house, we’d plug its features (the size, number of rooms, etc.) into our equation and predict a price.
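With several features, the equation simply gains one coefficient per feature. The sketch below assumes a model that has already been fitted; the feature names and coefficient values are hypothetical and only there to show the arithmetic.

```python
# Hypothetical fitted values, assumed for illustration only.
INTERCEPT = 30_000
COEFFICIENTS = {
    "square_feet": 150,   # dollars per extra square foot
    "bedrooms": 10_000,   # dollars per extra bedroom
    "age_years": -500,    # older houses sell for a little less
}

def predict_price(house):
    """Intercept plus each feature multiplied by its coefficient."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * house[name] for name in COEFFICIENTS
    )

new_house = {"square_feet": 1_800, "bedrooms": 3, "age_years": 10}
print(predict_price(new_house))  # 325000
```

Each coefficient says how much the predicted price moves when that one feature changes and the others stay the same.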

But how do we know our model is any good? To understand if our predictions are reliable, we need ways to measure our model’s performance. That’s where evaluation metrics come in.

Excerpted with permission from AI for the Rest of Us: An Illustrated Introduction, Sairam Sundaresan, Bloomsbury India.